Method for the preparation of insulin by cleavage of a fusion protein and fusion proteins containing insulin A and B chains

ABSTRACT

The invention provides a protein precursor of a double-chain molecule having at least two polypeptide chains linked by two disulfide bonds. The precursor has the general formula B—Z—A wherein B and A are the two insulin polypeptide chains, and Z is a polypeptide comprising at least one site for proteolytic cleavage in a eukaryotic host cell transformed by DNA coding for the protein precursor or in a eukaryotic cell-free system, resulting in a secreted double-chain product consisting of the disulfide-bonded A and B chains in which a portion of the Z polypeptide is retained on the A chain and/or the B chain and can be removed therefrom by enzymatic or chemical agents in vitro. Preferably the B and A polypeptides are, respectively, the B- and A-chains of insulin linked by two disulfide bonds, and the mature insulin molecule is isolatable from the secreted double-chain product. Preferably, the retained portion of the Z polypeptide comprises an affinity polypeptide tag for isolation and purification of the double-chain product. The Z polypeptide may further comprise an additional isolatable polypeptide of interest. The invention provides DNA sequences encoding the protein precursor, organisms transformed and transfected therewith and methods for preparing insulin from the protein precursor.

The present invention concerns double-chain disulfide-bonded molecules, particularly insulin, and precursor molecules for same, together with DNA sequences coding for same, processes for preparation of said precursors, and processes for the preparation of the molecule.

Human insulin is a non-steroidal hormone comprising two polypeptide chains (A and B); the A-chain comprising 21 amino acid residues (A¹⁻²¹) and the B-chain comprising 30 amino acid residues (B¹⁻³⁰). The A- and B-chains are joined by two interchain disulfide bridges. A third, intrachain disulfide bridge is formed within the A-chain.

Human insulin is naturally produced in the pancreas by the β-cells of the islets of Langerhans, via a single 110 amino acid precursor polypeptide (preproinsulin) (Chan, S. J. et al., 1976, Proc. Natl. Acad. Sci. USA, 73: 1964-1968; Shields and Blobel, 1977, Proc. Natl. Acad. Sci. USA, 74: 2059-2063) with a structure of:

(NH₂) pre-peptide-B-chain-C-peptide-A-chain (COOH)

The human preproinsulin (precursor) undergoes various post-translational modifications and events to convert it into mature insulin. The first step is removal of the prepeptide (Bell, G. I. et al., 1979, Nature 282: 525-527), which acts as a signal sequence to direct the molecule (proinsulin) upon synthesis into the endoplasmic reticulum (ER) and hence into the secretory pathway. After entry into the ER, the resultant proinsulin then folds and the three disulfide bridges are formed (Chan et al., 1976, supra; Lomedico, P. T. et al., 1977, J. Biol. Chem., 259: 7971-7978; Shields and Blobel, 1977, supra). The proinsulin then passes to the Golgi, is packaged into secretory granules and is converted into mature insulin by endoproteolytic cleavage (Steiner, D. F. et al., 1984, J. Cell. Biol., 24: 121-130; Tager and Steiner, 1974, Ann. Rev. Biochem., 43: 509-538).

Since the discovery of insulin in 1921, the nature of insulin preparations used to treat diabetics has shown a steady evolution (Owens, D. R., 1986, Human Insulin, pp 5-33 MTP Press). A human source of insulin has always been impractical due to low yields from the pancreas and degradation. However, the structure of insulin is highly conserved in other mammals, making it possible to use other animals as a source of insulin. This has led to the development of porcine and bovine insulins. However, they are difficult to manufacture, great care having to be taken to ensure purity and to minimise the allergic response they provoke.

Latterly, recombinant DNA methods have allowed the synthesis of various forms of recombinant human insulin. This has been achieved using E. coli and Saccharomyces cerevisiae. Early techniques involved the production of separate A- and B-chains (Goeddel, D., et al., 1979, Proc. Natl. Acad. Sci. USA, 76: 106-110; Chance, R. E. et al., 1981, In: Rich, D. H. & Gross, E. (eds.) Peptides: Synthesis-Structure-Function, Proc. Seventh American Peptide Symposium, pp 721-728, Rockford, IL, Pierce Chemical Co.; Frank, B. H. et al., 1981, In: Rich, D. H. & Gross, E. (eds.) Peptides: Synthesis-Structure-Function, Proc. Seventh American Peptide Symposium, pp 729-738, Rockford, IL, Pierce Chemical Co.; Steiner, D. F., et al., 1968, Proc. Natl. Acad. Sci. USA, 60: 622; and EP-A-0 090 433).

However, these procedures, all of which require the chemical combination of the A- and B-chains, have several serious drawbacks. One is that the fusion proteins accumulate intracellularly and are subject to proteolytic degradation. They must all be purified from the other intracellular materials, and E. coli materials are pyrogenic. Additionally, chemical disulfide bond formation is inefficient.

An alternative approach has been to produce insulin from eukaryotic cells and utilise the secretion pathway to modify precursor insulin into the mature form, as happens in the pancreatic β-cells, and also to secrete the product into the culture medium, away from the intracellular proteins, where there are few contaminants from which it needs to be purified. Examples of such work include EP 0 121 884 A; EP 0 195 691 A; Wollmer, A., et al., 1974, Hoppe-Seyler's Z. Physiol. Chem., 355: 1471-1476; Brandenburg, D. et al., 1973, Hoppe-Seyler's Z. Physiol. Chem., 354: 1521-1524; Thim, L., et al., 1986, Proc. Natl. Acad. Sci. USA, 83: 6766-6770; Thim, L., et al., 1987, FEBS Lett. 212: 307-312; EP 0 163 529 A; EP 0 427 296 A; Markussen, J., et al., 1986, (In: Peptides 1986, Theodoropoulos, D., (ed.) pp. 189-194, Proc. 19th Eur. Peptide Symp. on Peptide, Porto Carras-Chalkidiki, Greece. Walter de Gruyter & Co, New York) and EP 0 347 845 A. FIGS. 1 and 2 show a mini-proinsulin (Thim, L. et al., 1986, supra).

However, these are unable to give high yields of mature insulin or near-mature insulin, instead being primarily concerned with producing high levels of insulin precursors (for example, insulin precursors with the carboxy-terminus residue of the B-chain (B₃₀) missing) which subsequently require costly and extensive chemical alteration in order to convert them into mature insulin. The present invention overcomes the limitations and disadvantages of the prior art and provides simple, convenient and economic double-chain molecules and precursor molecules, in particular insulin, together with DNA sequences coding for same, processes for preparation of said precursors, and processes for the preparation of insulin and insulin analogues.

According to the present invention there is provided a protein precursor for at least two polypeptide chains having the general formula B-Z-A wherein B and A are the two polypeptide chains of a double-chain molecule, the two chains being linked by at least one disulfide bond, and Z is a polypeptide comprising at least one proteolytic cleavage site.

The precursor may be produced in a host. By ‘host’ is meant a system which is capable of producing the protein precursor of the present invention. The host may be cells of a single- or multi-cellular organism, or it may be a cell-free system. For example, the host may be eukaryotic. It may be yeast or fungal cells or it may be an animal, for example sheep, rat or mouse, or it may be a cell-line from an animal.

Alternatively, the precursor may be produced in a cell-free host system.

Proteolytic cleavage of Z may produce the double-chain molecule, possibly in its mature form, or a near precursor thereof.

The double-chain molecule may be insulin, the B and A polypeptides representing, respectively, the B- and A-chains of insulin.

Insulin may for example be human, bovine or porcine insulin or a partially modified form thereof. For example, modification may be by way of addition, deletion or substitution of amino acid residues. Substitutions may be conservative substitutions. Modification of human insulin to produce porcine insulin may be achieved by substitution of alanine at residue B₃₀. Bovine insulin may be produced from human insulin by substitution of alanine at residue B₃₀, of alanine at A₈ and of valine at A₁₀. Partially modified forms of molecules (comprising amino acid residues or nucleic acids) may be considered to be homologues of the molecules from which they were derived. They may have at least 50% homology with the molecules from which they were derived. They may for example have at least 60, 70, 80, 90 or 95% homology.
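By way of illustration only, the species substitutions described above can be sketched in Python. The chain sequences are the standard published human insulin A- and B-chain sequences in one-letter code; the helper names `substitute` and `percent_identity` are hypothetical, and simple position-by-position identity is assumed here as one possible measure of "homology" (the specification does not say how homology is to be calculated).

```python
# Standard published human insulin chains (one-letter amino acid code).
HUMAN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 residues, B1-B30
HUMAN_A = "GIVEQCCTSICSLYQLENYCN"           # 21 residues, A1-A21


def substitute(chain, position, new_residue):
    """Return a copy of chain with the residue at a 1-based position replaced."""
    index = position - 1
    return chain[:index] + new_residue + chain[index + 1:]


def percent_identity(seq1, seq2):
    """Percentage of identical positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


# Porcine insulin: alanine at B30, A-chain unchanged.
porcine_b = substitute(HUMAN_B, 30, "A")
porcine_a = HUMAN_A

# Bovine insulin: alanine at B30, alanine at A8 and valine at A10.
bovine_b = substitute(HUMAN_B, 30, "A")
bovine_a = substitute(substitute(HUMAN_A, 8, "A"), 10, "V")

# Identity of bovine to human insulin over all 51 residues (assumed measure).
bovine_vs_human = percent_identity(HUMAN_A + HUMAN_B, bovine_a + bovine_b)
```

Under this assumed measure, porcine and bovine insulin differ from human insulin at one and three of the 51 positions respectively, well above the 50% threshold mentioned above.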

The present inventors have found that, surprisingly, despite the problems associated with the prior art synthesis of recombinant insulin, mature insulin and near-precursors of insulin may be produced in vivo in organisms such as yeast using genetic constructs, the mature insulin resulting from post-translational processing of the precursor molecule. Moreover, these insulin molecules may be produced at high yield by yeast, making the present invention an economically viable alternative to the present methods of synthesising insulin.

The polypeptide Z may also comprise at least one additional polypeptide. Hence not only may a double-chain molecule such as insulin be produced, but an additional molecule or molecules, which may also require post-translational processing, may be produced.

The polypeptide Z may also comprise a purification sequence. The purification sequence may, for example, bind to heparin and/or phosvitin. The purification sequence may be a sequence which is recognised and bound by another molecule. This allows the purification sequence, and therefore the rest of the protein precursor, to be readily purified from a mixture which may contain various contaminants. The mixture may, for example, be a cell lysate.

The polypeptide Z may be of the general formula (I): KR-X-KR or an analogue thereof wherein K is lysine, R is arginine and X represents a chain of amino acid residues sufficient in length to facilitate cleavage in a host at the KR residues and eliminate processing losses. Analogues may of course include polypeptide Z having residues other than K and R which facilitate cleavage at the residues. Such cleavage sites, sequences and endopeptidases for achieving cleavage are well known.

Such a protein precursor may for example have the formula of Ins3 (FIG. 8; SEQ ID NO: 1) or a partially modified form thereof.

The polypeptide Z may be of the general formula (II): KR-X-M or an analogue thereof wherein K is lysine, R is arginine, M is methionine and X represents a chain of amino acid residues sufficient in length to facilitate cleavage in a host at the KR residue.

Such a protein precursor may for example have the formula of Ins4 (FIG. 9; SEQ ID NO: 2), Ins6 (FIG. 13; SEQ ID NO: 3) or Ins7 (FIG. 15; SEQ ID NO: 4) or a partially modified form thereof.

Alternatively, such a protein precursor could be for porcine insulin and have the sequence of Ins8 (FIG. 16; SEQ ID NO: 7) or Ins9 (FIG. 16; SEQ ID NO: 8). Similarly, it could be for bovine insulin and have the sequence of Ins10 (FIG. 17; SEQ ID NO: 9).

The polypeptide Z may be of the general formula (III): KR-Pur-M or an analogue thereof wherein K is lysine, R is arginine, Pur is a purification sequence, and M is methionine.

Treatment of such a protein precursor with for example cyanogen bromide may both cleave off the Pur purification sequence and simultaneously produce the mature double-chain molecule or a near precursor thereof.

By ‘near precursor thereof’ is meant a precursor of the double-chain molecule which may be simply converted into its mature state by, for example, treatment with a protease or proteases. For example, the double-chain molecule may be insulin, a near precursor being converted to mature insulin by treatment with carboxypeptidase B alone or by trypsin plus carboxypeptidase B.

Such a protein precursor may have the formula of Ins7 (FIG. 15; SEQ ID NO: 4) or a partially modified form thereof.
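The processing of a B-Z-A precursor in which Z has the shape KR-[spacer]-M, as in formulae (II) and (III), can be sketched as a simple string model. This is an illustration under stated assumptions, not the sequence of any Ins construct: the spacer `SPACER` is a hypothetical placeholder, the function names are invented for the example, and the model ignores disulfide bonding (in the real product the two chains remain linked) and the conversion of methionine to homoserine lactone by cyanogen bromide. It relies on two well-known specificities: KEX2-type cleavage on the carboxyl side of Lys-Arg, and cyanogen bromide cleavage on the carboxyl side of methionine. Insulin itself contains no methionine and no internal Lys-Arg pair, so neither chain is cut internally.

```python
# Standard published human insulin chains (one-letter amino acid code).
HUMAN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
HUMAN_A = "GIVEQCCTSICSLYQLENYCN"

SPACER = "GSGSGSGSGS"  # hypothetical placeholder for X or Pur


def cleave_after(sequence, motif):
    """Split a sequence after every occurrence of motif.

    The motif stays on the C-terminus of the upstream fragment.
    """
    fragments = []
    start = 0
    pos = sequence.find(motif)
    while pos != -1:
        end = pos + len(motif)
        fragments.append(sequence[start:end])
        start = end
        pos = sequence.find(motif, start)
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def trim_basic_c_terminus(fragment):
    """Model carboxypeptidase B: remove C-terminal Lys/Arg residues."""
    return fragment.rstrip("KR")


# Precursor of the form B-Z-A with Z = KR-spacer-M.
precursor = HUMAN_B + "KR" + SPACER + "M" + HUMAN_A

# Step 1: cleavage after Lys-Arg in the host (KEX2-type processing).
b_with_kr, spacer_m_a = cleave_after(precursor, "KR")

# Step 2: cyanogen bromide cleaves after Met, releasing the A-chain.
spacer_m, a_chain = cleave_after(spacer_m_a, "M")

# Step 3: carboxypeptidase B removes the Lys-Arg left on the B-chain.
b_chain = trim_basic_c_terminus(b_with_kr)
```

In this model the intermediate carrying Lys-Arg on the B-chain corresponds to a "near precursor" in the sense defined above, and steps 2 and 3 recover the unmodified A- and B-chain sequences.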

The polypeptide Z may be of the general formula (IV): KR-Y-M or an analogue thereof (for example having substitutions at K, R or M) wherein K is lysine, R is arginine, Y is a second polypeptide, and M is methionine.

Treatment of such a protein precursor with for example cyanogen bromide may produce the mature double-chain molecule and release the second polypeptide.

In such a protein precursor, Y may be a c-myc peptide sequence, the precursor having the formula of Ins4 (FIG. 9; SEQ ID NO: 2) or a partially modified form thereof.

The polypeptide Z may be of the general formula (V): KR-Y-N-Pur-M or an analogue thereof wherein K is lysine, R is arginine, Y is a second polypeptide, N is methionine or aspartic acid, Pur is a purification sequence, and M is methionine.

N may be methionine and treatment with for example cyanogen bromide may cause cleavage of the purification sequence from the second polypeptide.

N may be aspartic acid and treatment with for example Pseudomonas fragi mutant Me1 endopeptidase may cause cleavage of the purification sequence from the second polypeptide.

Y may for example be a c-myc peptide sequence, the purification sequence Pur binding specifically to heparin and phosvitin, the precursor having the formula of Ins6 (FIG. 13; SEQ ID NO: 3) or a partially modified form thereof.

The polypeptide Z may be of the general formula (VI): N-X-KR or an analogue thereof wherein N is methionine or aspartic acid, K is lysine, R is arginine, and X is a chain of amino acid residues sufficient in length to facilitate cleavage in a host at the KR residues.

The chain X of amino acid residues may comprise a purification sequence and/or a second polypeptide.

Such a protein precursor may have the formula of Ins2 (FIG. 5; SEQ ID NO: 5) or of Ins5 (FIG. 11; SEQ ID NO: 6) or a partially modified form thereof.

Additionally, a protein precursor according to the present invention may also comprise a leader peptide which directs the protein precursor into the secretion pathway of a host.

In analogues of formulae (I) to (VI), amino acid residues K, R, M and N may be substituted with alternative residues which still allow the production of the desired end-product or products. For example, substitution of KR could be for a sequence which is proteolytically cleaved by an endopeptidase.

Also provided according to the present invention are DNA sequences encoding the protein precursors of the present invention.

Such a DNA sequence may be adapted to a host wherein the codons of the DNA sequence correspond to the most abundant transfer RNAs for each amino acid in the host.

Such a DNA sequence may be selected from any one of the group of Ins2 (FIG. 5; SEQ ID NO: 10), Ins3 (FIG. 8; SEQ ID NO: 11), Ins4 (FIG. 9; SEQ ID NO: 12), Ins5 (FIG. 11; SEQ ID NO: 13), Ins6 (FIG. 13; SEQ ID NO: 14), Ins7 (FIG. 15; SEQ ID NO: 15), Ins8 (FIG. 16; SEQ ID NO: 16), Ins9 (FIG. 16; SEQ ID NO: 17) and Ins10 (FIG. 17; SEQ ID NO: 18) or a partially modified form thereof. For example, modifications may be by way of substitution of nucleic acid bases, the substituted sequences encoding the same amino acid sequence. Partially modified forms of DNA sequences may therefore be considered to be analogues of the sequences from which they were derived. Modified sequences may for example include the addition of transcription control sequences.

Also provided are DNA sequences according to the present invention when transfected or transformed into a host organism.

Also provided are host organisms transfected or transformed with a DNA sequence according to the present invention.

Methods of transfection and transformation are well known in the art and transgenic organisms may be readily produced.

Also provided are methods of production of a double-chain molecule or a near precursor thereof comprising expressing a DNA sequence according to the present invention in a host. Such a double-chain molecule may, for example, be insulin.

Such a method may comprise transforming or transfecting a host organism with an expression vector expressing a DNA sequence according to the present invention.

Such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (I), cultivating the transformed host in a suitable culture medium, recovering the secreted product or products and converting any near precursor of insulin into insulin by treatment with carboxypeptidase B alone or by treatment with trypsin and carboxypeptidase B.

Alternatively, such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (II), cultivating the transformed host in a suitable culture medium, recovering the secreted product and converting it to mature insulin by cleavage at the methionine residue with cyanogen bromide treatment in order to remove the chain of amino acid residues X.

Alternatively, such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (III), cultivating the transformed host in a suitable culture medium, recovering the secreted product via affinity chromatography via the purification sequence Pur and converting it to mature insulin by cleavage at the methionine residue with cyanogen bromide treatment in order to remove the chain of amino acid residues X.

Alternatively, such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (IV), cultivating the transformed host in a suitable culture medium, recovering the secreted product and converting it to mature insulin and releasing the second polypeptide Y by cleavage at the methionine residue with cyanogen bromide treatment.

Alternatively, such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (V), cultivating the transformed host in a suitable culture medium, recovering the secreted product via affinity chromatography via the purification sequence Pur, converting it to mature insulin by cleavage at the methionine residue with cyanogen bromide treatment in order to remove the chain of amino acid residues X and releasing the second polypeptide Y by cleavage at the residue N.

Alternatively, such a method of production may comprise transforming the host organism with an expression vector encoding a protein precursor wherein Z is of the general formula (VI), cultivating the transformed host in a suitable culture medium, recovering the secreted product, and converting it to mature insulin by cleavage at the aspartic acid residue by Pseudomonas fragi Me1 endopeptidase treatment in order to remove the chain of amino acid residues X. The chain of amino acid residues X may comprise at least either a purification sequence or a second polypeptide.

The culture medium may for example be a malt-extract-casamino acids culture medium.

The invention will be further apparent from the following description and figures which describe, by way of example only, various forms of protein precursor. Of the figures:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the construction of Ins1 from 4 chemically synthesized oligonucleotides;

FIG. 2 illustrates the DNA sequence encoding the Ins1 product. The amino acid and DNA sequences are SEQ ID NOs: 42 and 43 respectively;

FIG. 3 illustrates the construction of Ins2 precursors from 4 chemically synthesized oligonucleotides. The 9E10 epitope is residues 33-42 of SEQ ID NO: 5. The 32-mer is residues 96-127 of SEQ ID NO: 10. The 40-mer is the complementary sequence to residues 92-131 of SEQ ID NO: 10;

FIG. 4 illustrates the final derivation of Ins2 via in vitro recombination of two precursors containing errors. Ins2-16 has 1 base missing in the C region Lys-Arg. Ins2-18 has an error in the B-chain. Ins2-16/18 is constructed by in vitro recombination at the SalI site of the sequence-correct Ins2-16 ‘B arm’ with the sequence-correct Ins2-18 ‘A arm’;

FIG. 5 illustrates the DNA sequence (SEQ ID NO: 10) encoding the Ins2 product (SEQ ID NO: 5);

FIG. 6 illustrates the PCR mutagenesis scheme used to alter Ins-encoding sequences. Internal primers contain mismatches to the insulin sequence to introduce base changes during PCR;

FIG. 7 illustrates the primers used to derive Ins3 (primer SEQ ID NOs: 28 and 29) and Ins4 (primer SEQ ID NOs: 30 and 31) via in vitro mutagenesis. Ins3 was made by PCR mutagenesis of Ins2 using the general strategy and internal primers as shown. Ins4 was made by PCR mutagenesis of Ins3, using the general strategy and internal primers as shown;

FIG. 8 illustrates the DNA sequence (SEQ ID NO: 11) encoding the Ins3 product (SEQ ID NO: 1);

FIG. 9 illustrates the DNA sequence (SEQ ID NO: 12) encoding the Ins4 product (SEQ ID NO: 2);

FIG. 10 illustrates the derivation of Ins5 from Ins2 via insertion of a double-stranded synthetic oligonucleotide (Pur) into the SalI site. The Pur sequences form part of SEQ ID NOs: 6 and 13 (Ins5);

FIG. 11 illustrates the DNA sequence (SEQ ID NO: 13) encoding the Ins5 product (SEQ ID NO: 6);

FIG. 12 illustrates the derivation of Ins6 from Ins4 via insertion of a double-stranded synthetic oligonucleotide (Pur) into the SalI site. The Pur sequence is residues 43-60 of SEQ ID NO: 3 and corresponding nucleotides of SEQ ID NO: 14;

FIG. 13 illustrates the DNA sequence (SEQ ID NO: 14) encoding the Ins6 product (SEQ ID NO: 3);

FIG. 14 illustrates the general strategy for the derivation of Ins7 from Ins6 via PCR deletion of the 9E10 sequence with overlapping primers having SEQ ID NOs: 32 and 33;

FIG. 15 illustrates the DNA sequence (SEQ ID NO: 15) encoding the Ins7 product (SEQ ID NO: 4);

FIG. 16 illustrates the structure of porcine insulin precursor Ins8 derived from Ins4 and the residue of Ins4 which undergoes PCR mutagenesis, together with the structure of porcine insulin precursor Ins9 derived from Ins6 and the residue of Ins7 which undergoes PCR mutagenesis. Primers (SEQ ID NOs: 34 and 35) used to derive Ins8 and Ins9 via in vitro mutagenesis are also shown. Ins8 has amino acid and DNA sequences SEQ ID NOs: 7 and 16. Ins9 has amino acid and DNA sequences SEQ ID NOs: 8 and 17;

FIG. 17 illustrates the general structure of bovine insulin precursor Ins10 derived from Ins9 and the residues of Ins9 which undergo PCR mutagenesis. Primers (SEQ ID NOs: 36 and 37) used to derive Ins10 (amino acid and DNA sequences SEQ ID NOs: 9 and 18) via in vitro mutagenesis are also shown;

FIG. 18 illustrates the yeast expression-secretion vectors used. The HindIII cleavage site is amino acid and DNA sequences SEQ ID NOs: 38 and 39; and

FIG. 19 illustrates the MFα secretion leader/synthetic insulin coding sequence fusions and proteolytic processing sites. The pDP314 amino acid and DNA sequences are SEQ ID NOs: 40 and 41.

EXPERIMENTAL

1. Design and Preparation of Synthetic Recombinant Insulin-encoding Sequences

The insulin-encoding sequences were sufficiently small to permit their construction via chemical synthesis. This had the advantage over using human proinsulin cDNA, as used by others, that the synthetic DNA could be designed to contain exclusively codons most highly favoured for expression in yeast. This had the advantages of facilitating maximum translation efficiency (Hoekema, A., et al., 1987, Mol. Cell. Biol. 7: 2914-2924; Hadfield, C., et al., 1993, Mycol. Res. 97: 897-944) and of minimizing possible amino acid misincorporation errors (Scorer, C. A., et al., 1991, Nucleic Acids Res. 19: 3511-3516). Accordingly, the insulin-encoding sequences were synthesized from oligonucleotides containing codons that corresponded to the most abundant yeast transfer RNAs for each encoded amino acid (Sharp & Cowe, 1991, Yeast 7: 657-678):

Phe UUU/UUC
Leu UUG (UUA)
Ile AUU
Met AUG
Val GUU
Ser UCU (UCA)
Pro CCA
Thr ACU (ACA)
Tyr UAU/UAC
Ala GCU (GCC/GCA)
His CAU (CAC)
Gln CAA
Asn AAU (AAC)
Lys AAA/AAG
Asp GAU (GAC)
Glu GAA
Cys UGU
Trp UGG
Arg CGU
Ser AGU
Gly GGU

Stop: UAA UAA (used in tandem to ensure translation termination)

where / indicates either codon and ( ) contains a second-choice codon less frequently used in yeast.
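The design rule above, one preferred codon per amino acid followed by a tandem stop, can be sketched as a back-translation routine. This is a minimal illustration, not the inventors' design procedure: the dictionary takes the first codon listed for each amino acid in the table above (an arbitrary choice where two codons are given as equivalent, and UCU rather than AGU for serine), the function name `back_translate` is hypothetical, and the output is written as DNA (T in place of U).

```python
# First-listed codon for each amino acid in the table above (RNA alphabet).
PREFERRED_CODON = {
    "F": "UUU", "L": "UUG", "I": "AUU", "M": "AUG", "V": "GUU",
    "S": "UCU", "P": "CCA", "T": "ACU", "Y": "UAU", "A": "GCU",
    "H": "CAU", "Q": "CAA", "N": "AAU", "K": "AAA", "D": "GAU",
    "E": "GAA", "C": "UGU", "W": "UGG", "R": "CGU", "G": "GGU",
}

STOP_CODON = "UAA"


def back_translate(protein, stop_codons=2):
    """Return a DNA coding sequence for protein using the preferred codons.

    Stop codons are appended in tandem, as described in the text.
    """
    rna = "".join(PREFERRED_CODON[residue] for residue in protein)
    rna += STOP_CODON * stop_codons
    return rna.replace("U", "T")


# Example: the human insulin A-chain (standard published sequence).
a_chain_dna = back_translate("GIVEQCCTSICSLYQLENYCN")
```

The resulting sequence is 69 nucleotides long (21 codons plus two stop codons); a real design would additionally need to place restriction sites and avoid unwanted ones, which this sketch does not attempt.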

The oligonucleotides were synthesized as single strands, which were annealed together afterwards to form double-stranded DNA. Two initial insulin precursor-encoding sequences were prepared in this manner, termed Ins1 and Ins2 (FIGS. 1 and 3). Each was constructed from 4 synthetic oligonucleotides (78-138 nucleotides). The synthetic DNAs were arranged to have a 5′ HindIII sticky end and a 3′ BamHI sticky end, enabling them to be cloned between the corresponding sites in plasmid pUC19 and the recombinant plasmid recovered in E. coli.

All other insulin precursor-encoding variants were constructed by derivation from Ins2, using two methods. Firstly, Ins2 was designed to contain restriction endonuclease sites for SalI and NsiI, enabling in vitro rearrangement, or more synthetic DNA to be inserted at either of these sites via in vitro ligation. Secondly, site-directed in vitro mutagenesis methods using synthetic oligonucleotide primers were used to alter the encoding DNA (i.e. to change codons or delete DNA).
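When designing a synthetic sequence with internal cloning sites of this kind, it is useful to confirm that each site occurs exactly where intended and nowhere else. The following sketch scans a sequence for the recognition sequences of the four enzymes mentioned above (HindIII AAGCTT, BamHI GGATCC, SalI GTCGAC, NsiI ATGCAT). The function name `find_sites` and the toy sequence are hypothetical; the toy sequence is not an Ins sequence and, unlike the synthetic DNAs described above, carries complete terminal sites rather than pre-formed sticky ends.

```python
# Recognition sequences of the restriction enzymes named in the text.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "NsiI": "ATGCAT",
}


def find_sites(dna):
    """Map each enzyme name to the 0-based start positions of its site."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        pos = dna.find(site)
        while pos != -1:
            positions.append(pos)
            pos = dna.find(site, pos + 1)  # allow overlapping matches
        hits[enzyme] = positions
    return hits


# Hypothetical toy insert: HindIII ... SalI ... NsiI ... stop stop BamHI.
toy_insert = "AAGCTT" "GGTATT" "GTCGAC" "CAAGAA" "ATGCAT" "TAATAA" "GGATCC"
site_map = find_sites(toy_insert)
```

A design in which any enzyme maps to more than one position would cut at an unintended place, so a check that every list in `site_map` has length one is a reasonable acceptance test for a candidate sequence.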

By using these methods a further eight variant insulin precursor-encoding sequences were made, termed: Ins3, Ins4, Ins5, Ins6, Ins7, Ins8, Ins9 and Ins10. All of the Ins variants contained the insulin B-chain and A-chain (human for Ins1 to 7, porcine for Ins8 and 9, and bovine for Ins10), but differed in the C-peptide region, as summarized in Table 1.

C region variants contained either no sites for S. cerevisiae endopeptidase KEX2, one site or two sites. The KEX2 sites were Lys-Arg, the dipeptide which the endopeptidase cleaves the most efficiently (Julius, D., et al., 1984, Cell 37: 1075-1089). Ins1 encodes the mini-proinsulin of Thim, L., et al., 1986, supra, in which the B- and A-chains are linked by Lys-Arg, but folding of the molecule prevents cleavage by KEX2. Other C region variants encode the 9E10 epitope, a small c-myc peptide which is specifically recognized by a monoclonal antibody (Evan, G. I., et al., 1985, Mol. Cell. Biol. 5: 3610-3616), and/or a peptide sequence designated Pur, which binds with high specificity to phosvitin or heparin for affinity purification (see for example EP-A-91308382.0). C region variants of Ins2-10 contain between 14 and 30 amino acid residues. In Ins2 and Ins5 the KEX2 site is between the B-chain and C-peptide. In Ins4, Ins6, Ins7, Ins8, Ins9 and Ins10 the KEX2 site is between the C-peptide and the A-chain. In Ins3 there are two KEX2 sites, one at each of the aforementioned positions. Ins2-16 is like Ins2 but lacks a KEX2 site.

2. Preparation of Gene and Plasmid Vector Constructs for Yeast

The synthetic insulin precursor-encoding sequences were combined into yeast genes in such a way as to provide fusion to a secretion leader peptide, a promoter for transcription and a terminator for efficient transcription termination. They were incorporated into plasmid vectors able to replicate and to be selected in yeast. All of these features were provided by plasmid vectors pDP314, pDP315 and pDP316 (see FIG. 18; C. Hadfield, 1994, In: The Molecular Genetics of Yeast: A Practical Approach, J. R. Johnston (ed), IRL Press, pp. 1748). These plasmids enabled fusion to the MFα1 secretion leader in three variations, as shown in FIG. 19. Fusion into pDP316 is to the pre peptide which incorporates the secretion signal peptide. Cleavage by signal peptidase at the pre-Ins junction liberates the Ins precursor with a mature B-chain amino-terminus. Fusion into pDP315 provides a pre-pro leader. Cleavage by KEX2 endopeptidase at the pro-Ins junction liberates the Ins precursor with a mature B-chain amino-terminus. Plasmid pDP314 provided fusion to pre-pro-EAEA (where E is Glu and A is Ala). Cleavage by KEX2 endopeptidase liberates EAEA-Ins. Removal of EAEA from the amino-terminus of the Ins precursor by the STE13 exopeptidase is known to be inefficient (Brake, A. J., et al., 1984, Proc. Natl. Acad. Sci. USA, 81: 4642-4646; Thim, L., et al., 1986, supra) and so only a small proportion of the Ins products would be expected to be produced with the mature B-chain amino-terminus.

These leader peptide fusion variants were made to investigate their effect on yield and processing. Evidence indicates that presence of the pro sequence is beneficial to secreted yield in the case of small peptides (Hadfield, C., et al., 1993, Mycol. Res. 97: 897-944). On the other hand, presence of the pro sequence introduces an additional KEX2 processing site, which may be disadvantageous when other KEX2 processing sites occur within the Ins precursor.

Recombinant plasmids were constructed as described (C. Hadfield, 1994, supra). They were recovered by transformation into E. coli and DNA preparations isolated.

3. Transformation of S. cerevisiae by the Recombinant Plasmids

The pDP vectors contain the ORI-STB region of 2 μm, facilitating multicopy replication and stable inheritance in yeast, and the LEU2 gene, which enables the presence of the plasmid to be selected for in leu2-defective host strains. Plasmid DNA was transformed into yeast by the lithium acetate procedure (Ito, H., et al., 1983, J. Bacteriol. 153: 163-168) and transformants able to grow in minimal medium lacking leucine were isolated.

A number of yeast strains were transformed. All were mating-type α to enable the MFα promoter on the plasmid to function. The strains were:

BF307-10 α leu2 trp1 his3 ura3 met4 ade4 ade6 arg4;

DBY746 α leu2 trp1 his3 ura3 can1;

JRY188 α ura3-52 leu2-3,-112 trp1 his4 rme sir_(ts);

PMY1 α leu2-3,-112 his4;

BJ1991 α ura3-52 leu2-3,-112 trp1 pep4-3 prb1-1122; and

BF307-10 kex2::URA3.

These facilitated study of the effect of strain variation and defects in major vacuolar protease encoding genes and the KEX2 endoprotease.

4. Expression and Stability of Recombinant Insulin Genes in Yeast

Stability of Ins Plasmids

Transformant yeast colonies were inoculated into 10 ml of minimal medium lacking leucine (to maintain the selection for the plasmid) and shaken at 30° C. overnight. Cells from the resultant culture were used to inoculate YPD rich non-selective medium and selective minimal medium. After growth for at least 10 cell doublings, cells were plated to determine the proportion of plasmid-containing cells. In both cases, over 99% of cells contained the plasmid, demonstrating a high degree of mitotic stability and demonstrating that expression of the insulin precursor products did not affect plasmid stability.
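The retention figure can be converted into an approximate per-generation loss rate. The sketch below assumes a simple model that is not stated in the text: a constant probability of plasmid loss at each doubling, equal growth rates for plasmid-containing and plasmid-free cells, and no selection (as in YPD). Under those assumptions the fraction retained after n doublings is (1 - loss)^n, so the loss per doubling is 1 - F^(1/n). The function name is hypothetical.

```python
def per_generation_loss(fraction_retained, generations):
    """Plasmid loss per doubling under a constant-loss model.

    Solves fraction_retained = (1 - loss) ** generations for loss.
    """
    if not 0.0 < fraction_retained <= 1.0:
        raise ValueError("fraction_retained must be in (0, 1]")
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 1.0 - fraction_retained ** (1.0 / generations)


# Figures from the text: at least 99% retention after at least 10 doublings.
loss = per_generation_loss(0.99, 10)
```

With 99% retention over 10 doublings this gives a loss of roughly 0.1% per doubling; because the text reports "over 99%" and "at least 10" doublings, this value is an upper bound under the assumed model.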

Relative Expression Yields

Studies were undertaken to compare the relative yields of insulin precursor products obtained with the different types of constructs. Transformed yeast cells were grown under identical conditions in selective minimal medium or rich YPD medium and the amount of insulin material secreted into the culture medium quantified using the Boehringer ELISA assay kit for insulin, which utilises monoclonal antibodies to both the A- and B-chains of human insulin.

Ins1, Ins2, Ins2-16, Ins3, Ins4 and Ins5 gave very similar yields. The yield for Ins6 was 60% of their level and Ins7 30%, although these may be due to reduced reactivity in the assay rather than less material present. As both Ins1 and Ins2-16 are not cleaved internally by KEX2, the presence of single or double KEX2 sites within the other precursors did not result in reduced yield. These results indicate that precursors internally cleaved by KEX2 endopeptidase in yeast may be produced without detriment to yield which would have prevented their economic use for insulin manufacture. Thus, Ins3, which has two internal KEX2 cleavage sites, was not recovered at low yield like other human proinsulin variants expressed from similar constructs in yeast (Thim, L. et al., 1986, supra) and so overcomes the inefficiency associated with proinsulin expression in yeast. The other two construct types, having single KEX2 sites either at the junction with the A-chain (Ins2, Ins5) or the B-chain (Ins4, Ins6, Ins7), as indicated in Table 1, showed no significant effect on yield of having a KEX2 site in either of these positions.

Products expressed from pDP314 fusions appeared to form aggregates. This was absent in constructs lacking the amino-terminal GluAlaGluAla peptide extension on the Ins product. Yields were reduced to 25% when fused to the pre sequence alone in pDP316, indicating that absence of the pro-sequence reduced yield. Presence of the additional KEX2 site in the pro leader did not seem to affect yield.

Effect of Host Strain and Culture Temperature

The relative maximum yields of Ins products after growth in YPD medium at 30° C. (18-30 hours) were analysed when different yeast host strains were used. Slight, but not substantial, variation between strains was observed, and mutations in vacuolar protease genes did not influence the yield.

Lowering the culture temperature from 30° C. to 20° C. resulted in an approximately ten-fold increase in yield. An intermediate level was obtained at 25° C. Reduced temperature culture at 20° C., down from the temperature of 30° C. at which S. cerevisiae is commonly incubated, is therefore an important part of the present invention.

Effect of Culture Medium

Insulin precursor product yields were up to 3-fold higher in rich YPD medium than in semi-defined minimal medium. This reflected a difference in cell density rather than a difference in product yield per cell. YPD contains many proteins which are disadvantageous for purification. However, semi-defined minimal medium supplemented with casamino acids, or malt extract medium supplemented with casamino acids, resulted in cell densities and product yields almost as high as YPD medium.

5. Product Characterization

The insulin precursor products secreted by transformed yeasts were characterized by several means.

(a) ELISA assays on culture supernatants using the Boehringer Enzymum-Test Insulin assay kit (catalogue number 1 289 101) detected insulin material in the culture supernatants. The kit utilises two monoclonal antibodies, which recognize B-chain and A-chain epitopes of human insulin. Further antibody assays were performed at the Department of Clinical Biochemistry, Addenbrooke's Hospital, University of Cambridge, with a battery of individual monoclonal antibodies recognizing B-chain or A-chain epitopes. These all reacted with the Ins materials, confirming that they carried human insulin epitopes.

(b) Culture supernatants were concentrated by a variety of methods (including ultrafiltration, ammonium sulphate precipitation and lyophilization) and samples were analysed on both denaturing and non-denaturing polyacrylamide gels for confirmation of the expected molecular weight. Such gels were also western blotted and probed with anti-insulin antibodies. The antibodies used were polyclonal antibodies raised against human insulin in guinea pigs and commercially available monoclonal antibodies for human insulin raised in mouse. Positive reactions against the Ins products were obtained, confirming them to have insulin epitopes.

(c) Loss of the ‘C’ region of Ins3 (which contains the 9E10 epitope) by endoproteolytic processing was confirmed by probing the western blot with anti-9E10 monoclonal antibody (raised in mouse). In contrast, Ins2 and Ins4, which also contain the ‘C’ region 9E10 epitope, but which are cleaved only once internally, were recognised by the anti-9E10 antibody.

(d) Chemical verification of the product was obtained by peptide sequencing.

5. Recovery and Purification Via a Specific Affinity Binding Peptide within the ‘C Region’.

Insulin precursor products (Ins5, Ins6, Ins7, Ins9 and Ins10) containing the peptide sequence ‘Pur’ within the C region may be specifically isolated from the cell-free culture supernatant by affinity column binding (European patent application number 91308382.0). Binding occurs to heparin-sepharose or phosvitin-sepharose. As this system specifically targets an artificial peptide within the C region, it differs from the use of anti-insulin immunoadsorption columns, which specifically bind the B- and/or A-chains of insulin (see for example EP-A-0427296).

The Pur peptide is a consensus sequence, derived from studies (at VMSRF) of the interaction of a number of proteins with phosvitin and heparin. The Pur peptide provides predominantly basic charges that are required for such interaction. In addition, it has been designed to avoid internal cleavage by KEX2 endopeptidase, so that it remains intact during passage through the yeast secretion pathway.

6. Production of Mature Insulin

(a) From Insulin Precursor Type I

A precursor of the type B-KR-X-KR-A directed into the yeast secretion pathway, where it folds and the disulphide bonds form, may be fully cleaved by KEX2 to yield the product A::B-KR, in which :: represents the two disulphide bonds established between the A- and B-chains.

Complete in vivo exopeptidase modification by KEX1 will remove the KR residues (Dmochowska, A., et al., 1987, Cell 50: 573-584) to produce mature insulin, A::B.

Inefficient processing by KEX2 may result in single cleavage products (A::B-KR-X-KR or X-KR-A::B-KR) or uncleaved product being secreted into the culture medium. To increase the yield, these may be treated in vitro with trypsin to cleave at the KR residues (W. Kemmler et al., 1971, J. Biol. Chem. 246: 6786-6791).

By the same token, inefficient in vivo processing by KEX1 may be compensated by in vitro treatment with carboxypeptidase B.

In the case of Ins3, an example of this kind of precursor, analysis of the secretion products from yeast indicates the presence of mature insulin and some higher molecular weight precursors.
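
The processing route described in 6(a) can be illustrated with a short sketch. It is a simplified model only: KEX2 is assumed to cleave solely on the carboxyl side of Lys-Arg pairs, KEX1 (or carboxypeptidase B) is assumed to trim every trailing Lys or Arg, and folding and disulphide bonding are ignored. The one-letter sequences are transcribed from the 67-residue B-KR-9E10-KR-A sequence in the sequence listing; the function names are hypothetical.

```python
# One-letter sequences transcribed from the sequence listing (Ins3 type precursor).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"   # 30 residues
C_REGION = "EQKLISEEDLVD"                    # 9E10 epitope region, 12 residues
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"            # 21 residues
INS3 = B_CHAIN + "KR" + C_REGION + "KR" + A_CHAIN


def kex2_cleave(seq):
    """Split a sequence on the carboxyl side of every Lys-Arg (KR) pair.

    Assumption: only KR pairs are modelled as KEX2 sites.
    """
    fragments, start, i = [], 0, 0
    while i < len(seq) - 1:
        if seq[i:i + 2] == "KR":
            fragments.append(seq[start:i + 2])
            start = i + 2
            i += 2
        else:
            i += 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def kex1_trim(fragment):
    """Remove trailing Lys/Arg residues, as KEX1 or carboxypeptidase B would."""
    return fragment.rstrip("KR")


cleaved = kex2_cleave(INS3)                    # B-KR, C-KR, A
processed = [kex1_trim(f) for f in cleaved]    # B, C, A
for name, fragment in zip(("B-chain", "C region", "A-chain"), processed):
    print(f"{name}: {fragment} ({len(fragment)} residues)")
```

In this model the B- and A-chains are recovered with their native termini, while the C region is released as a separate peptide.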

(b) From Precursor Types II, III, IV and V

Precursors having the general formula (II): B-KR-X-M-A are cleaved in vivo by KEX2 and secreted by yeast in the form: X-M-A::B(-KR) where (-KR) indicates that these residues may be removed by KEX1 activity in vivo or by carboxypeptidase B in vitro. Treatment with cyanogen bromide cleaves at the M residue to produce mature insulin.

The advantage of this kind of insulin derivation is most apparent with precursors derived from the general formula (II), having the general formula (III): B-KR-Pur-M-A and its derivative having the general formula (V): B-KR-Y-N-Pur-M-A. These contain an affinity purification sequence (Pur) within the ‘C region’, which enables the precursor to be recovered, purified and concentrated from the culture supernatant. Subsequent cleavage with cyanogen bromide removes this purification sequence and it may be separated from the mature insulin product by passage through an affinity column, on which it will be retained.
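
As a simplified illustration of the route in 6(b), the sketch below models cyanogen bromide as cleaving on the carboxyl side of every Met residue and applies it to the X-M-A fragment left after KEX2 cleavage of an Ins4-type precursor (B-KR-9E10-KM-A in Table 1). The sequences are transcribed from the sequence listing. Conversion of Met to homoserine lactone and the disulphide bonds are not modelled, and the function name is hypothetical.

```python
# One-letter sequences transcribed from the sequence listing (Ins4 type precursor).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
X_M = "EQKLISEEDLVDKM"          # 9E10 region followed by Lys-Met


def cnbr_cleave(seq):
    """Split a sequence on the carboxyl side of every Met (M) residue."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue == "M":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


# Fragment carrying the A-chain after in vivo KEX2 cleavage: X-M-A.
secreted_a_fragment = X_M + A_CHAIN
fragments = cnbr_cleave(secreted_a_fragment)
released_a_chain = fragments[-1]

print("C-region fragment:", fragments[0])
print("Released A-chain: ", released_a_chain)
```

The approach relies on neither insulin chain containing Met, so in this model cyanogen bromide leaves both chains intact.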

(c) From Precursor Type VI

Precursors having the general formula (VI): B-N-X-KR-A are cleaved in vivo by KEX2 and secreted by yeast in the form: A::B-N-X(-KR) where (-KR) indicates that these residues may be removed by KEX1 activity in vivo or by carboxypeptidase B in vitro. If N is aspartic acid, in vitro treatment with P. fragi mutant Me1 endopeptidase (Sigma Catalogue No. P3303 Me1 endopeptidase from mutant Pseudomonas fragi) cleaves amino-terminally to the N residue to produce mature insulin.

Derivatives containing an affinity purification sequence (Pur) within X can be recovered, purified and concentrated from the culture supernatant. Subsequent cleavage at N removes this purification sequence and it may be separated from the mature insulin product by passage through an affinity column, on which it will be retained.

7. Second Products

Precursors having the general formula (V): B-KR-Y-N-Pur-M-A or its derivatives, or analogous derivatives of formula (VI), incorporate a second product within the ‘C region’. These are separated from the insulin product by cyanogen bromide treatment as in 6(b) above.

EXPERIMENTAL

General molecular biology and recombinant DNA methods used throughout the experimental work are as described in J. Sambrook et al., 1989, Molecular cloning: A laboratory manual, 2nd edition, Cold Spring Harbor Laboratory, New York; and G. J. Boulnois, 1987, Gene cloning and analysis, a laboratory guide, Blackwell Scientific Publications, Oxford.

Example 1 Construction of Synthetic Recombinant Insulin Precursor Sequences

(a) Oligonucleotide Synthesis.

The synthetic insulin precursor encoding sequences were designed to use only codons that are highly expressed in S. cerevisiae and to be constructed using chemically synthesised oligonucleotides. Six oligonucleotides (incorporating A chain, B chain and modified C chain) were chemically synthesised on an Applied Biosystems DNA synthesiser at the Department of Biochemistry and Pathology, University of Cambridge, Cambridge, England. The sequences and sizes were:

Oligo A1 (SEQ ID NO: 19: 92mer)

5′ CTA CAC TCC AAA GAC TAA GAG AGG TAT CGT TGA ACA ATG TTG TAC TTC TAT CTG TTC TTT GTA CCA ATT GGA AAA CTA CTG TAA CTA ATA AG 3′

Oligo A2 (SEQ ID NO: 20: 80mer)

5′ GAT CCT TAT TAG TTA CAG TAG TTT TCC AAT TGG TAC AAA GAA CAG ATA GAA GTA CAA CAT TCT TCA ACG ATA CCT CTC TT 3′

Oligo B1 (SEQ ID NO: 21: 78mer)

5′ AGC TTT CGT TAA CCA ACA CTT GTG TGG TTC TCA CTT GGT TGA AGC CTT GTA CTT GGT TTG TGG TGA AAG AGG TTT CTT 3′

Oligo B2 (SEQ ID NO: 22: 90mer)

5′ ACT CTT TGG AGT GTA GAA GAA ACC TCT TTC ACC ACA AAC CAA GTA CAA GGC TTC AAC CAA GTG AGA ACC ACA CAA GTC TTG GTT AAC GAA 3′

Oligo B3 (SEQ ID NO: 23: 134mer)

5′ CTA CAC TCC AAA GAC TAT GCA TGA ACA AAA GTT GAT CTC TGA AGA AGA CTT GGT CGA CAA GAG AGG TAT GCT TGA ACA ATG TTG TAC TTC TAT CTG TTC TTT GTA CCA ATT GGA AAA CTA CTG TAA CTA ATA AG 3′

Oligo B4 (SEQ ID NO: 24: 122mer)

5′ GAT CCT TAT TAG TTA CAG TAG TTT TCC AAT TGG TAC AAA GAA CAG ATA GAA GTA CAA CAT TGT TCA ACG ATA CCT CTC TTG TCG ACC AAG TCT TCT TCA GAG ATC AAC TTT TGT TCA TGC AT 3′

The yields (1 unit A₂₆₀=33 μg/ml) of these oligonucleotides on the 0.2 μmole scale were: Oligo A1 137 units, Oligo A2 89 units, Oligo B1 93 units, Oligo B2 113 units, Oligo B3 128 units and Oligo B4 108 units. Their integrity was checked by running ³²P end-labelled samples of the oligonucleotide preparations on denaturing polyacrylamide gels and visualising the size-separated products by autoradiography.
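
The stated conversion factor allows the yields to be expressed as mass. The following is a minimal sketch, assuming that one A₂₆₀ unit denotes the amount of oligonucleotide that gives an absorbance of 1.0 in 1 ml, so that mass in μg is simply units multiplied by 33. The variable names are illustrative.

```python
# Conversion factor quoted in the text: 1 unit A260 = 33 ug/ml.
UG_PER_A260_UNIT = 33.0

# Yields in A260 units as reported for each oligonucleotide.
yields_units = {
    "Oligo A1": 137,
    "Oligo A2": 89,
    "Oligo B1": 93,
    "Oligo B2": 113,
    "Oligo B3": 128,
    "Oligo B4": 108,
}

# Assumption: one unit corresponds to 1 ml at A260 = 1.0, so ug = units * 33.
yields_ug = {name: units * UG_PER_A260_UNIT for name, units in yields_units.items()}
total_ug = sum(yields_ug.values())

for name, mass in yields_ug.items():
    print(f"{name}: {mass / 1000:.2f} mg")
print(f"Total: {total_ug / 1000:.2f} mg")
```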

(b) Purification of Oligonucleotides and Phosphorylation

The oligonucleotides synthesised above were ethanol precipitated and pelleted in a microcentrifuge. The oligonucleotide pellets were suspended in water (distilled and deionized) and aliquots were then electrophoresed in a preparative denaturing polyacrylamide gel. The size-separated DNA was visualized by UV shadowing and the top-most band (i.e. the largest sized oligonucleotide) in each case excised from the gel with a sterile scalpel blade. The DNA was eluted with elution buffer (0.5 M NH₄OAc, 1 mM EDTA [pH 8.0]) at room temperature (24° C.) for 16 hours. The eluted DNA was again ethanol precipitated, dried and suspended in water. The gel-purified DNAs were phosphorylated (5′-OH end) using polynucleotide kinase.

(c) Construction of Ins1

1. Annealing of Oligonucleotides and Cloning of Duplex DNA

Purified oligonucleotides A1 and A2 were mixed together, each at a concentration of 1 mg/ml in water, and sealed within a glass capillary. Purified oligonucleotides B1 and B2 were likewise mixed together. One litre of water was heated to 100° C. and the heat source turned off. The two capillaries containing the oligonucleotides were immersed in the boiling water bath, which was then covered with aluminium foil and allowed to cool slowly overnight to room temperature.

The two glass capillaries were opened and the contents mixed together. The mixture was then immersed in an 80° C. water bath, which was allowed to cool to room temperature.

The annealed duplex DNA was cloned into pUC19 that had been cleaved with HindIII and BamHI. The ligation reaction not only joined the synthetic DNA into the vector, but also simultaneously joined the backbones of A1-A2 to B1-B2 (see FIG. 1).

2. Recovery of Clones and Initial Analysis

The ligated DNA was transformed into competent cells of E. coli strain NM522 (supE thi Δ(lac-proAB) Δhsd5 (rk⁻ mk⁺) [F′ proAB lacI^(Q)ZΔM15]) and plated onto L agar containing ampicillin (50 μg/ml) and X-gal (25 μg/ml).

White recombinant colonies (non-recombinants blue) were picked and cultured in L broth containing ampicillin at 37° C. for 16 hours. Plasmid DNA was then extracted using the alkaline lysis method (Birnboim and Doly).

The plasmid DNAs were restricted with HindIII and BamHI and the presence of a 170 bp band (cloned synthetic DNA) looked for by gel electrophoresis.

3. Purified Plasmid DNA Preparation and DNA Sequence Analysis

Purified plasmid preparations were made using triton lysis followed by ethidium bromide/caesium chloride density gradient ultracentrifugation. The plasmid DNA was quantified (A₂₆₀) and re-analysed by gel electrophoresis.

The cloned synthetic DNAs were sequenced using the T7 sequencing kit from Pharmacia with the M13 mp universal primers. The sequence shown in FIG. 2 was verified.

(d) Construction of Ins2

1. Annealing of Oligonucleotides and Cloning of Duplex DNA

Purified oligonucleotides A3 and A4 were mixed together, each at a concentration of 1 mg/ml in water, and sealed within a glass capillary. Purified oligonucleotides B1 and B2 were likewise mixed together. One litre of water was heated to 100° C. and the heat source turned off. The two capillaries containing the oligonucleotides were immersed in the boiling water bath, which was then covered with aluminium foil and allowed to cool slowly overnight to room temperature.

The two glass capillaries were opened and the contents mixed together. The mixture was then immersed in an 80° C. water bath, which was allowed to cool to room temperature.

The annealed duplex DNA was cloned into pUC19 that had been cleaved with HindIII and BamHI. The ligation reaction not only joined the synthetic DNA into the vector, but also simultaneously joined the backbones of A3-A4 to B1-B2 (see FIG. 3).

2. Recovery of Clones and Initial Analysis

The ligated DNA was transformed into competent cells of E. coli strain NM522 (supE thi Δ(lac-proAB) Δhsd5 (rk⁻ mk⁺) [F′ proAB lacI^(Q)ZΔM15]) and plated onto L agar containing ampicillin (50 μg/ml) and X-gal (25 μg/ml).

White recombinant colonies (non-recombinants blue) were picked and cultured in L broth containing ampicillin at 37° C. for 16 hours. Plasmid DNA was then extracted using the alkaline lysis method (Birnboim and Doly).

The plasmid DNAs were restricted with HindIII and BamHI and the presence of a 212 bp band (cloned synthetic DNA) looked for by gel electrophoresis. Clones showing such a band were further analysed on restriction digestion gels to verify the presence of NsiI and SalI sites within the cloned DNA.
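
The expected band sizes of 170 bp (Ins1) and 212 bp (Ins2) follow from the oligonucleotide lengths given in Example 1(a). The sketch below simply adds the lengths of the two oligonucleotides that make up each strand of the assembled insert. It assumes that the 134mer and 122mer, listed above as Oligo B3 and Oligo B4, are the oligonucleotides referred to as A3 and A4 in the Ins2 construction; the four-base cohesive ends are not treated separately.

```python
# Oligonucleotide lengths (nucleotides) as stated in Example 1(a).
OLIGO_LENGTHS = {"A1": 92, "A2": 80, "B1": 78, "B2": 90, "B3": 134, "B4": 122}


def strand_length(first, second):
    """Length of one strand formed by ligating two oligonucleotides end to end."""
    return OLIGO_LENGTHS[first] + OLIGO_LENGTHS[second]


# Ins1: sense strand B1 + A1, antisense strand A2 + B2.
ins1_sense = strand_length("B1", "A1")
ins1_antisense = strand_length("A2", "B2")

# Ins2 (assumption: the 134mer and 122mer are the A3 and A4 of the text).
ins2_sense = strand_length("B1", "B3")
ins2_antisense = strand_length("B4", "B2")

print("Ins1 strands:", ins1_sense, ins1_antisense)
print("Ins2 strands:", ins2_sense, ins2_antisense)
```

Both strands of each insert come to the same length, which matches the band size used to screen the clones.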

3. Purified Plasmid DNA Preparation and DNA Sequence Analysis

Purified plasmid preparations were made using triton lysis followed by ethidium bromide/caesium chloride density gradient ultracentrifugation. The plasmid DNA was quantified (A₂₆₀) and re-analysed by gel electrophoresis.

The cloned synthetic DNAs were sequenced using the T7 sequencing kit from Pharmacia with the M13 mp universal primers.

4. Final Derivation of Ins2 Via In Vitro Recombination

All of the synthetic cloned DNAs analysed by sequencing contained mutations of the desired sequence. In order to obtain an Ins2 clone of the required sequence, two defective versions (Ins2-16 and Ins2-18) were used to create an in vitro recombined correct version, Ins2-16/18. This was possible because the defects in clones 16 and 18 were on opposite sides of the central SalI site, as indicated in FIG. 4.

The HindIII-SalI fragment from Ins2-16, and the SalI-BamHI fragment from Ins2-18, were isolated from preparative polyacrylamide gels and ligated together into pUC19 to create Ins2-16/18. The authenticity of this clone was verified by sequencing (see FIG. 5).

(e) Construction of Ins3

Ins3 was derived from Ins2 using PCR mutagenesis, the strategy for which is shown in FIG. 6. The primers used to change MetHis codons to LysArg are shown in FIG. 7. Clones obtained were sequenced to confirm they would encode the product shown in FIG. 8 when expressed in yeast.
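
The MetHis to LysArg change can be made concrete by translating the affected codons. The sketch below uses a deliberately partial codon table containing only the six codons involved, and two 12-base segments transcribed from the 204 bp sequences in the sequence listing that encode the MetHis and LysArg forms. The segment names and helper function are illustrative.

```python
# Partial codon table: only the codons needed for this segment (standard genetic code).
CODONS = {"ACT": "T", "ATG": "M", "CAT": "H", "GAA": "E", "AAG": "K", "AGA": "R"}


def translate(dna):
    """Translate an in-frame DNA segment using the partial table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Codons spanning the junction after residue B30 (Thr).
INS2_SEGMENT = "ACTATGCATGAA"   # Thr Met His Glu
INS3_SEGMENT = "ACTAAGAGAGAA"   # Thr Lys Arg Glu

# Positions (0-based) at which the two segments differ.
changed = [i for i, (a, b) in enumerate(zip(INS2_SEGMENT, INS3_SEGMENT)) if a != b]

print(translate(INS2_SEGMENT), "->", translate(INS3_SEGMENT))
print("Substituted bases:", len(changed))
```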

(f) Construction of Ins4

Ins4 was derived from Ins3 using the PCR mutagenesis strategy described in FIG. 6. The primers used to change an Arg codon to Met are shown in FIG. 7. Clones obtained were sequenced to confirm they would encode the product shown in FIG. 9 when expressed in yeast.

(g) Construction of Ins5

Ins5 was created by inserting a synthetic purification sequence (Pur) into the SalI site of Ins2 (FIG. 10). The Pur sequence, as shown in FIG. 10, was created by annealing two synthetic oligonucleotides (prepared as described for Ins1 and Ins2 construction above). Clones obtained were sequenced to confirm they would encode the product shown in FIG. 11 when expressed in yeast.

(h) Construction of Ins6

Ins6 was created by inserting the synthetic purification sequence, Pur, into the SalI site of Ins4 (FIG. 12). Clones obtained were sequenced to confirm they would encode the product shown in FIG. 13 when expressed in yeast.

(i) Construction of Ins7

Ins7 was derived from Ins6 by PCR deletion using the general strategy of FIG. 6 as shown in FIG. 14. Clones obtained were verified by sequencing (FIG. 15).

(j) Construction of Ins8

Ins8 was derived from Ins4 by PCR mutagenesis using the general strategy of FIG. 6 as shown in FIG. 16. Clones obtained were sequenced to verify the change of residue B₃₀.

(k) Construction of Ins9

Ins9 was derived from Ins6 by PCR mutagenesis using the general strategy of FIG. 6 as shown in FIG. 16. Clones obtained were sequenced to verify the change of residue B₃₀.

(l) Construction of Ins10

Ins10 was derived from Ins9 by PCR mutagenesis using the general strategy of FIG. 6 as shown in FIG. 17. Clones obtained were sequenced to verify the change of residues A₈ and A₁₀.

Example 2 Construction of MFα1-Ins Fusion Shuttle Vector Plasmids

pUC19-Ins plasmid DNA (where Ins represents all Ins precursor gene constructs, Ins1 to Ins10) was cleaved with HindIII and BamHI to liberate the Ins fragment DNA. The DNA was size separated by electrophoresis on a 3.5% polyacrylamide gel and stained with ethidium bromide. The Ins DNA fragment was excised under ultraviolet illumination and the DNA eluted and concentrated by ethanol precipitation.

pDP314, pDP315 and pDP316 DNA were cleaved with HindIII and BamHI, phenol-chloroform extracted and then ethanol precipitated and washed with 70% ethanol (Boulnois et al). The pellet was resuspended in 400 μl of TE buffer, ethanol precipitated, washed with 70% ethanol and dried under vacuum.

Ins fragment and pDP vector DNAs were resuspended in 5 mM Tris-HCl (pH 7.5). To make in vitro recombinant plasmid clones, 1.0 μg of pDP DNA was mixed on ice with 20 ng of Ins fragment DNA, ligation buffer and T4 DNA ligase, in a total volume of 25 μl, and incubated overnight at 15° C. The ligated DNA was transformed into E. coli and ampicillin-resistant colonies selected on L agar containing 50 μg/ml ampicillin.

Colonies were picked into L broth containing 50 μg/ml ampicillin and incubated overnight at 37° C. in a shaking incubator. Plasmid DNA was extracted and analysed for the presence of the correctly sized pDP vector and Ins insert fragments following digestion with HindIII and BamHI. In the cases of Ins2-10, their presence could be detected by introduction of SalI or NsiI sites.

Insertion of the HindIII-BamHI Ins fragments into pDP314 resulted in creation of the required fusion with the MFα pre-pro-EAEA leader (as shown in FIG. 19). In the cases of pDP315 and pDP316 further modifications had to be made to generate the required fusions. In the case of pDP315-Ins the first step was cleavage of 2.5 μg of the DNA with HindIII and StuI, followed by phenol-chloroform extraction, ethanol precipitation, washing with 70% ethanol and vacuum drying. The 5′ single-stranded projection at the HindIII end was removed by treatment with mung bean nuclease (Pharmacia). Afterwards, the enzyme was denatured by phenol-chloroform extraction, followed by ethanol precipitation, washing with 70% ethanol and vacuum drying. The DNA was resuspended in 5 mM Tris-HCl (pH 7.5) and recircularized by ligation. This process removed the EAEA dipeptides, creating a pre-pro-Ins fusion (as shown in FIG. 19).

In the case of pDP316-Ins the first step was cleavage of 2.5 μg of the DNA with HindIII and SphI, followed by phenol-chloroform extraction, ethanol precipitation, washing with 70% ethanol and vacuum drying. The 5′ single-stranded projection at the HindIII end and the 3′ projection at the SphI end were removed by treatment with mung bean nuclease (Pharmacia). Afterwards the enzyme was denatured by phenol-chloroform extraction, followed by ethanol precipitation, washing with 70% ethanol and vacuum drying. The DNA was resuspended in 5 mM Tris-HCl (pH 7.5) and recircularized by ligation. This process removed the pro-peptide and EAEA dipeptides, creating a pre-Ins fusion (as shown in FIG. 19).

All fusions were sequence verified using the following primers:

314/315 (-40) AAA TAC TAC TAT TGC CAG C (SEQ ID NO: 25) or 316 (-64) CAT ACA CAA TAT AAA CGA CC (SEQ ID NO: 26), with ADH reverse (-44) CAA GGT AGA CAA GCC GAC (SEQ ID NO: 27)

Example 3 Expression in Yeast

Insulin precursor products secreted into the culture medium by transformant cells carrying the different synthetic gene constructions were quantified using the Boehringer ELISA assay kit for insulin, which utilises monoclonal antibodies to both the A- and B-chains of human insulin.

Shake-flask cultures were inoculated from a fresh overnight minimal selective culture at 1:1000 dilution and incubated at the prescribed temperature with continuous shaking. Samples were withdrawn periodically for analysis: cell growth was monitored by increase in optical density at 600 nm and insulin yield by ELISA assay. Stationary phase in cell growth was reached after 24-36 hours, depending upon medium, and this coincided with peak insulin yield.

(a) Yields in YPD

Culture at 30° C. of BF307-10 transformants resulted in yields of approximately 2.0 mg/l for Ins1, Ins2, Ins2-16, Ins3, Ins4 and Ins5 fused to the pre-pro leader. Yields calculated by ELISA assay were 1.6 mg/l for Ins6 and 0.8 mg/l for Ins7, although gel analysis suggested that this reflected relative reactivity towards the Ins products rather than less material. Products expressed from pDP314 fusions appeared to form aggregates. This aggregation was absent in constructs lacking the amino-terminal GluAlaGluAla peptide extension on the Ins product. Yields were reduced to 25% of the above when the Ins product was fused to the pre sequence alone in pDP316, indicating that absence of the pro sequence reduced yield. Presence of the additional KEX2 site in the pro leader did not seem to affect yield.

As Ins1 and Ins2-16, which were not subject to internal KEX2 proteolytic cleavage, showed similar yields to the other construct forms, the presence of KEX2 cleavage sites within the Ins region did not reduce yield. Thus, Ins3, which has two internal KEX2 cleavage sites, was not recovered at low yield like human proinsulin expressed from a similar construction in yeast (Thim et al., 1986, Proc. Natl. Acad. Sci. USA, 83: 6766-6770) and so overcomes the inefficiency associated with proinsulin expression in yeast.

The other two construct types, having single KEX2 sites either at the junction with the A-chain (Ins2, Ins5) or the B-chain (Ins4, Ins6, Ins7), as indicated in Table 1, showed no effect on yield of having a KEX2 site in either of these positions.

(b) Effect of Host Strain

The relative maximum yields of Ins products after growth in YPD medium at 30° C. (18-30 hours) were analysed: the cells were removed and the culture supernatants were subjected to ELISA assays for insulin.

A₄₂₀ nm* Strain Ins7 Ins6 Ins1 BF307-10 1.75 PMY1 1.14 1.33 DBY746 0.58BJ1991 0.87 0.90 TGY47.1 1.47 *Unconverted data from ELISA assay.

Thus, there is a slight effect of yeast host strain on yield, but not a substantial one.

(c) Effect of Culture Temperature on Yield

Lowering the culture temperature from 30° C. to 20° C. resulted in a yield of 15-20 mg/l in YPD, a ten-fold increase over the yield at 30° C. An intermediate level was obtained at 25° C.

(d) Yields in Other Media

PMY1 transformant cells were inoculated into different media (all containing 2% glucose) from a minimal selective culture, as described, and shaken at 30° C. for 24 hours. Cell growth was measured by absorbance at 600 nm and the amount of insulin product in the culture medium determined (A₄₂₀ nm being the unconverted ELISA data). The data shown are for the Ins7 and Ins6 products.

           Ins7                                Ins6
Medium     Cell A₆₀₀   Ins A₄₂₀   Ins/cell     Cell A₆₀₀   Ins A₄₂₀   Ins/cell
YPD        7.10        1.03       0.14         7.32        1.22       0.17
ME         4.80        0.47       0.10         4.70        1.00       0.21
ME + CA    6.18        0.72       0.12         7.06        0.99       0.14
SD         2.40        0.39       0.16         2.10        0.90       0.43
SD + CA    6.63        0.83       0.12         5.50        1.12       0.20

ME = 2% malt extract; CA = 1% casamino acids; SD = semi-defined minimal medium

The results show that although cell growth varied quite widely between different media, insulin product yield per unit cell OD remained fairly constant.
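
The per-cell figures in the table are the ratio of the ELISA signal to the cell density. A minimal sketch of that calculation follows, using the tabulated values; the data layout and names are illustrative, and the recomputed ratios are compared with the tabulated ones only to within 0.01, since the published figures appear to have been rounded from unrounded data.

```python
# (medium, product): (cell A600, Ins A420, tabulated Ins per cell)
DATA = {
    ("YPD", "Ins7"): (7.10, 1.03, 0.14), ("YPD", "Ins6"): (7.32, 1.22, 0.17),
    ("ME", "Ins7"): (4.80, 0.47, 0.10), ("ME", "Ins6"): (4.70, 1.00, 0.21),
    ("ME + CA", "Ins7"): (6.18, 0.72, 0.12), ("ME + CA", "Ins6"): (7.06, 0.99, 0.14),
    ("SD", "Ins7"): (2.40, 0.39, 0.16), ("SD", "Ins6"): (2.10, 0.90, 0.43),
    ("SD + CA", "Ins7"): (6.63, 0.83, 0.12), ("SD + CA", "Ins6"): (5.50, 1.12, 0.20),
}

# Insulin signal per unit cell density: Ins A420 divided by cell A600.
per_cell = {key: ins / cells for key, (cells, ins, _) in DATA.items()}

# Largest difference between recomputed and tabulated ratios.
max_deviation = max(abs(per_cell[key] - DATA[key][2]) for key in DATA)

for (medium, product), ratio in per_cell.items():
    print(f"{medium:8s} {product}: {ratio:.3f}")
print(f"Largest deviation from tabulated values: {max_deviation:.3f}")
```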

Example 4 Characterization of the Ins3 Product

A sample of the Ins3 product was concentrated by ultrafiltration, using the Millipore Minitan system with a filter having a 3,000 Dalton molecular weight cut-off. The retentate was then passed through a 30,000 Dalton molecular weight cut-off Centricon filter to remove proteins of ≧30,000 Daltons. The sample was further purified by gel filtration on a Sephadex G-50M column. Peak fractions containing insulin activity were then run on a preparative SDS-polyacrylamide gel. After blotting onto a membrane, the slower-running band was peptide sequenced by the Leicester University peptide sequencing service. The product obtained had the same peptide sequence as human insulin B-chain (Table 2). When pDP314 was used as the vector the major product contained GluAlaGluAla at the amino-terminus, showing that this extension had not been removed by the STE13 exopeptidase. As anticipated from the work of A. J. Brake et al., 1984, supra and Thim, L., et al., 1986, supra, the pDP314 product is unsuitable for pharmaceutical use. However, products in which the amino-terminus is derived by endopeptidase cleavage, such as those from pDP315 and pDP316, provide the correct amino-terminus for the B-chain and are therefore suitable for pharmaceutical use.

Example 5 Recovery and Purification Via a Specific Affinity Binding Peptide within the ‘C Region’

The Ins5, Ins6, Ins7, Ins9 and Ins10 products contain a peptide sequence ‘Pur’ within the C region that enables affinity purification (see for example EP-A-91308382.0).

(a) Column Preparation

40 ml of heparin-sepharose or phosvitin-sepharose gel was packed onto a sintered glass column (60 mm diam.×100 mm length). The gel bed thickness was 12-16 mm.

(b) Affinity Binding and Recovery

BJ1991 (pDP314-Ins2), BJ1991 (pDP314-Ins6) and BJ1991 (pDP314-Ins7) transformant cells were cultured in YPD medium for 24 hours at 30° C. 100 ml of culture supernatant was adjusted to pH 7.5. Heparin-sepharose columns were equilibrated with 20 mM Tris-HCl pH 7.5 and the supernatants loaded.

Proportion of Ins material binding

         Column 1             Column 2    Elution of
         Pass 1    Pass 2     Pass 3      bound material
Ins2     none      none       none
Ins6     50%       12%        none        100%
Ins7     40%       10%        none        100%

Only Ins material containing the Pur sequence (Ins6 and Ins7, not Ins2) binds to the column. The Pur sequence therefore facilitates purification.

Reloading of the breakthrough of the first pass through the column results in more material binding. However, loading of the breakthrough of the second pass onto a fresh column did not result in more binding. Thus, 40-50% of the material failed to bind under these conditions.

Elution with 20 mM Tris-HCl pH 7.5, 1.0 M NaCl resulted in recovery of all the material that had bound to the column.

Similar results were obtained with phosvitin-sepharose as the column material.

Improved binding efficiency was obtained using altered binding buffer conditions. The heparin-sepharose column was equilibrated with 20 column volumes of sodium acetate buffer pH 5.0. The Ins7 culture supernatant was adjusted to pH 4.5-5.0 with glacial acetic acid and loaded onto the column under gravity. The column was washed with 20 mM sodium phosphate buffer pH 6.25 after loading to recover unbound material.

                                        A₄₂₀ units    % of input
Input activity (from 1.2 l culture)     1,018,200     100%
Passed through column in loading 1         60,000       6%
Passed through column in loading 2         20,000       2%
Total bound                               998,200      98%

Elution with 250 ml of 20 mM sodium phosphate buffer pH 6.25, 0.5 M NaCl resulted in recovery of all of the bound material.

The 2% of material that failed to bind to the column was aggregated material associated with the pDP314 fusion construction.
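
The binding figures above can be checked with a simple mass balance. The sketch below assumes that loading 2 is a reloading of the breakthrough from loading 1, so that the material still unbound after loading 2 is the only loss; that reading is consistent with the tabulated total. Variable names are illustrative.

```python
# ELISA activity in A420 units, taken from the table above.
input_units = 1_018_200
passed_loading_1 = 60_000   # breakthrough after the first loading
passed_loading_2 = 20_000   # breakthrough after reloading (assumed final loss)

# Material retained on the column after both loadings.
total_bound = input_units - passed_loading_2


def percent_of_input(units):
    """Express an activity as a whole-number percentage of the input."""
    return round(100 * units / input_units)


print("Passed in loading 1:", percent_of_input(passed_loading_1), "%")
print("Passed in loading 2:", percent_of_input(passed_loading_2), "%")
print("Total bound:", total_bound, "units,", percent_of_input(total_bound), "%")
```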

TABLE 1 RECOMBINANT INSULIN PRECURSOR VARIANTS

                                                                             Tag
Construct   Structure                                  Chains   Extension     Processing   Purification
Ins1        B--KR--A                                   1
Ins2        B--M-9E10-KR--A                            2        (B-chain)-C   +
Ins2-16     B--M-9E10-R--A                             1                      +
Ins3        B--KR-9E10-KR--A                           2                      +
Ins4        B--KR-9E10-KM--A                           2        N-(A-chain)   +
Ins5        B--MH-9E10-Pur-KR--A                       2        (B-chain)-C   +            +
Ins6        B--KR-9E10-Pur-KM--A                       2        N-(A-chain)   +            +
Ins7        B--KR-Pur-KM--A                            2        N-(A-chain)                +
Ins8        B(Ala³⁰)--KR-9E10-KM--A                    2        N-(A-chain)   +
Ins9        B(Ala³⁰)--KR-9E10-Pur-KM--A                2        N-(A-chain)   +            +
Ins10       B(Ala³⁰)--KR-9E10-Pur-KM--A(Ala⁸, Val¹⁰)   2        N-(A-chain)   +            +

H: B--/--A
Porcine: B(Ala³⁰)--/--A
Bovine: B(Ala³⁰)--/--A(Ala⁸, Val¹⁰)

TABLE 2 Results of N-terminal sequencing of B-chain

Cycle No.   Residue    Amount (pmoles)
1           Phe        338.57
2           Ala        390.76
3           Val        251.60
4           Asn        234.58
5           Gln        209.20
6           His        155.35
7           Leu        172.88
8           ? (Cys)
9           Gly        91.06
10          Ser        35.20
11          His        37.50
12          Leu        59.03
13          Val        46.85
14          Glu        31.39
15          Ala        36.20
16          Leu        14.63
17          Tyr        32.81
18          Leu        22.19
19          Val        7.92

Number of sequences: 43

SEQ ID NO: 1
67 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Lys Arg Gly Ile 48
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 64
Tyr Cys Asn 67

SEQ ID NO: 2
67 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Lys Met Gly Ile 48
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 64
Tyr Cys Asn 67

SEQ ID NO: 3
83 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Met His Gly Leu 48
Arg Ala Arg Asn Arg Ser Lys Thr Gly Pro Val Asp Lys Met Gly Ile 64
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 80
Tyr Cys Asn 83

SEQ ID NO: 4
69 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Lys Arg 32
Gly Leu Arg Ala Arg Asn Arg Ser Lys Thr Gly Pro Val Asp Lys Met 48
Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu 64
Glu Asn Tyr Cys Asn 69

SEQ ID NO: 5
67 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Met His 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Lys Arg Gly Ile 48
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 64
Tyr Cys Asn 67

SEQ ID NO: 6
83 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Met His 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Met His Gly Leu 48
Arg Ala Arg Asn Arg Ser Lys Thr Gly Pro Val Asp Lys Arg Gly Ile 64
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 80
Tyr Cys Asn 83

SEQ ID NO: 7
67 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Lys Met Gly Ile 48
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 64
Tyr Cys Asn 67

SEQ ID NO: 8
83 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Met His Gly Leu 48
Arg Ala Arg Asn Arg Ser Lys Thr Gly Pro Val Asp Lys Met Gly Ile 64
Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn 80
Tyr Cys Asn 83

SEQ ID NO: 9
83 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Lys Arg 32
Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp Met His Gly Leu 48
Arg Ala Arg Asn Arg Ser Lys Thr Gly Pro Val Asp Lys Met Gly Ile 64
Val Glu Gln Cys Cys Ala Ser Val Cys Ser Leu Tyr Gln Leu Glu Asn 80
Tyr Cys Asn 83

SEQ ID NO: 10
204 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT ATGCATGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACAAGAGAGG TATCGTTGAA CAATGTTGTA CTTCTATCTG TTCTTTGTAC 180
CAATTGGAAA ACTACTGTAA CTAA 204

SEQ ID NO: 11
204 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACAAGAGAGG TATCGTTGAA CAATGTTGTA CTTCTATCTG TTCTTTGTAC 180
CAATTGGAAA ACTACTGTAA CTAA 204

SEQ ID NO: 12
204 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACAAGATGGG TATCGTTGAA CAATGTTGTA CTTCTATCTG TTCTTTGTAC 180
CAATTGGAAA ACTACTGTAA CTAA 204

SEQ ID NO: 13
252 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT ATGCATGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACATGCATGG TTTGAGAGCT AGAAACAGAT CTAAGACCGG TCCAGTCGAC 180
AAGAGAGGTA TCGTTGAACA ATGTTGTACT TCTATCTGTT CTTTGTACCA ATTGGAAAAC 240
TACTGTAACT AA 252

SEQ ID NO: 14
252 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACATGCATGG TTTGAGAGCT AGAAACAGAT CTAAGACCGG TCCAGTCGAC 180
AAGATGGGTA TCGTTGAACA ATGTTGTACT TCTATCTGTT CTTTGTACCA ATTGGAAAAC 240
TACTGTAACT AA 252

SEQ ID NO: 15
210 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGGTT TGAGAGCTAG AAACAGATCT 120
AAGACCGGTC CAGTCGACAA GATGGGTATC GTTGAACAAT GTTGTACTTC TATCTGTTCT 180
TTGTACCAAT TGGAAAACTA CTGTAACTAA 210

SEQ ID NO: 16
204 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGGCC AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACAAGATGGG TATCGTTGAA CAATGTTGTA CTTCTATCTG TTCTTTGTAC 180
CAATTGGAAA ACTACTGTAA CTAA 204

SEQ ID NO: 17
252 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGGCC AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACATGCATGG TTTGAGAGCT AGAAACAGAT CTAAGACCGG TCCAGTCGAC 180
AAGATGGGTA TCGTTGAACA ATGTTGTACT TCTATCTGTT CTTTGTACCA ATTGGAAAAC 240
TACTGTAACT AA 252

SEQ ID NO: 18
252 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGGCC AAGAGAGAAC AAAAGTTGAT CTCTGAAGAA 120
GACTTGGTCG ACATGCATGG TTTGAGAGCT AGAAACAGAT CTAAGACCGG TCCAGTCGAC 180
AAGATGGGTA TCGTTGAACA ATGTTGTGCT TCTGTTTGTT CTTTGTACCA ATTGGAAAAC 240
TACTGTAACT AA 252

SEQ ID NO: 19
92 base pairs; nucleic acid; single; linear
CTACACTCCA AAGACTAAGA GAGGTATCGT TGAACAATGT TGTACTTCTA TCTGTTCTTT 60
GTACCAATTG GAAAACTACT GTAACTAATA AG 92

SEQ ID NO: 20
80 base pairs; nucleic acid; single; linear
GATCCTTATT AGTTACAGTA GTTTTCCAAT TGGTACAAAG AACAGATAGA AGTACAACAT 60
TCTTCAACGA TACCTCTCTT 80

SEQ ID NO: 21
78 base pairs; nucleic acid; single; linear
AGCTTTCGTT AACCAACACT TGTGTGGTTC TCACTTGGTT GAAGCCTTGT ACTTGGTTTG 60
TGGTGAAAGA GGTTTCTT 78

SEQ ID NO: 22
90 base pairs; nucleic acid; single; linear
ACTCTTTGGA GTGTAGAAGA AACCTCTTTC ACCACAAACC AAGTACAAGG CTTCAACCAA 60
GTGAGAACCA CACAAGTCTT GGTTAACGAA 90

SEQ ID NO: 23
134 base pairs; nucleic acid; single; linear
CTACACTCCA AAGACTATGC ATGAACAAAA GTTGATCTCT GAAGAAGACT TGGTCGACAA 60
GAGAGGTATG CTTGAACAAT GTTGTACTTC TATCTGTTCT TTGTACCAAT TGGAAAACTA 120
CTGTAACTAA TAAG 134

SEQ ID NO: 24
122 base pairs; nucleic acid; single; linear
GATCCTTATT AGTTACAGTA GTTTTCCAAT TGGTACAAAG AACAGATAGA AGTACAACAT 60
TGTTCAACGA TACCTCTCTT GTCGACCAAG TCTTCTTCAG AGATCAACTT TTGTTCATGC 120
AT 122

SEQ ID NO: 25
19 base pairs; nucleic acid; single; linear
AAATACTACT ATTGCCAGC 19

SEQ ID NO: 26
20 base pairs; nucleic acid; single; linear
CATACACAAT ATAAACGACC 20

SEQ ID NO: 27
18 base pairs; nucleic acid; single; linear
CAAGGTAGAC AAGCCGAC 18

SEQ ID NO: 28
28 base pairs; nucleic acid; single; linear
ACTAAGAGAG AACAAAAGTT GATCTCTG 28

SEQ ID NO: 29
28 base pairs; nucleic acid; single; linear
TTCTCTCTTA GTCTTTGGAG TGTAGAAG 28

SEQ ID NO: 30
24 base pairs; nucleic acid; single; linear
GACAAGATGG GTATCGTTGA ACAA 24

SEQ ID NO: 31
24 base pairs; nucleic acid; single; linear
ACCCATCTTG TCGACCAAGT CTTC 24

SEQ ID NO: 32
24 base pairs; nucleic acid; single; linear
AAGAGAGGTT TGAGAGCTAG AAAC 24

SEQ ID NO: 33
24 base pairs; nucleic acid; single; linear
CAAACCTCTC TTAGTCTTTG GAGT 24

SEQ ID NO: 34
27 base pairs; nucleic acid; single; linear
AAGGCCAAGA GAGAACAAAA GTTGATC 27

SEQ ID NO: 35
28 base pairs; nucleic acid; single; linear
CTTGGCCTTT GGAGTGTAGA AGAAACCT 28

SEQ ID NO: 36
29 base pairs; nucleic acid; single; linear
GTTGTGCTTC TGTTTGTTCT TTGTACCAA 29

SEQ ID NO: 37
30 base pairs; nucleic acid; single; linear
GAACAAACAG AAGCACAACA TTGTTCAACG 30

SEQ ID NO: 38
4 amino acids; amino acid; unknown
Glu Ala Glu Ala 4

SEQ ID NO: 39
13 base pairs; nucleic acid; double; linear
GAGGCTGAAG CTT 13

SEQ ID NO: 40
7 amino acids; amino acid; unknown
Lys Arg Glu Ala Glu Ala Phe 7

SEQ ID NO: 41
21 base pairs; nucleic acid; double; circular
AAGAGAGAGG CTGAAGCTTT C 21

SEQ ID NO: 42
53 amino acids; amino acid; unknown
Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 16
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr Lys Arg 32
Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu 48
Glu Asn Tyr Cys Asn 53

SEQ ID NO: 43
162 base pairs; nucleic acid; double; linear
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT 60
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGGTA TCGTTGAACA ATGTTGTACT 120
TCTATCTGTT CTTTGTACCA ATTGGAAAAC TACTGTAACT AA 162
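
As a consistency check on the listing, the nucleotide sequence of SEQ ID NO: 43 can be translated and compared with the amino acid sequence of SEQ ID NO: 42. The following is a minimal Python sketch, not part of the original disclosure. It assumes that the listed strand is the coding strand read in frame from its first base and that the standard genetic code applies. The names `translate`, `CODON_TABLE`, `SEQ_ID_NO_43` and `SEQ_ID_NO_42` are illustrative, and the one-letter string is a hand transcription of the three-letter listing.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding strand in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# SEQ ID NO: 43 as listed (assumed to be the coding strand).
SEQ_ID_NO_43 = """
TTCGTTAACC AACACTTGTG TGGTTCTCAC TTGGTTGAAG CCTTGTACTT GGTTTGTGGT
GAAAGAGGTT TCTTCTACAC TCCAAAGACT AAGAGAGGTA TCGTTGAACA ATGTTGTACT
TCTATCTGTT CTTTGTACCA ATTGGAAAAC TACTGTAACT AA
"""

# SEQ ID NO: 42 transcribed by hand into one-letter code.
SEQ_ID_NO_42 = "FVNQHLCGSHLVEALYLVCGERGFFYTPKTKRGIVEQCCTSICSLYQLENYCN"

protein = translate(SEQ_ID_NO_43)
print(len(protein), protein == SEQ_ID_NO_42)
```

The same function could be applied to SEQ ID NOs: 10 to 18, although which amino acid sequence each of those encodes should be read from the translation rather than assumed from the numbering.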

What is claimed is:
 1. A DNA sequence encoding a protein precursor of insulin, wherein the precursor has a formula selected from any one of the group of SEQ ID NOs: 1 to 6.
 2. A DNA sequence encoding a protein precursor of insulin, wherein the precursor has a formula selected from any one of the group of SEQ ID NOs: 7, 8 and 9.
 3. A DNA sequence encoding a single-chain protein precursor wherein the precursor has a formula selected from the group consisting of SEQ ID NOs: 1 to 9.
 4. A DNA sequence selected from the group consisting of SEQ ID NOs: 10 to 18, or a DNA sequence that encodes the same protein as encoded by any one of SEQ ID NOs: 10 to 18.
 5. A yeast or fungal cell transfected or transformed with a DNA sequence selected from any one of the group of SEQ ID NOs: 10 to 18, or a DNA sequence that encodes the same protein as encoded by any one of SEQ ID NOs: 10 to 18.
 6. A DNA sequence encoding a single-chain insulin precursor having the formula B-KR-Pur-M-A wherein A and B represent the A and B polypeptide chains, respectively, of insulin, K is lysine, R is arginine, Pur is a purification sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of an insulin precursor containing the sequence from a mixture of molecules, and M is methionine, and cleavage at the KR residues produces a double-chain insulin precursor Pur-M-A::B, wherein :: represents two disulfide bonds established between the A and B chains.
 7. The DNA sequence according to claim 6 wherein treatment of the double-chain insulin precursor with cyanogen bromide both cleaves off the Pur sequence and simultaneously produces the mature insulin molecule A::B.
 8. The DNA sequence according to claim 6 wherein the insulin is selected from the group consisting of human insulin, porcine insulin and bovine insulin.
 9. The DNA sequence according to claim 6 wherein the purification sequence binds specifically to heparin and/or phosvitin.
 10. A DNA sequence encoding a single-chain insulin precursor having the formula B-KR-Y-M-A wherein A and B represent the A and B polypeptide chains, respectively, of insulin, K is lysine, R is arginine, Y is an additional biological polypeptide of interest, and M is methionine, and cleavage at the KR residues produces a double-chain insulin precursor Y-M-A::B, wherein :: represents two disulfide bonds established between the A and B chains.
 11. The DNA sequence according to claim 10 wherein treatment of the double-chain insulin precursor with cyanogen bromide produces the mature insulin molecule A::B and releases the polypeptide Y.
 12. The DNA sequence according to claim 10 wherein Y is a c-myc peptide sequence.
 13. The DNA sequence according to claim 10 wherein the insulin is selected from the group consisting of human insulin, bovine insulin and porcine insulin.
 14. A DNA sequence encoding a single-chain insulin precursor having the formula B-KR-Y-N-Pur-M-A, wherein A and B represent the A and B polypeptide chains, respectively, of insulin, K is lysine, R is arginine, Y is an additional biological polypeptide of interest, N is methionine or aspartic acid, Pur is a purification sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of an insulin precursor containing the sequence from a mixture of molecules, and M is methionine, and cleavage at the KR residues produces a double-chain insulin precursor Y-N-Pur-M-A::B, wherein :: represents two disulfide bonds established between the A and B chains.
 15. The DNA sequence according to claim 14 wherein N is methionine and treatment of the double-chain insulin precursor with cyanogen bromide causes cleavage of the polypeptide Y from the purification sequence.
 16. The DNA sequence according to claim 14 wherein N is aspartic acid and treatment of the double-chain insulin precursor with Pseudomonas fragi mutant Me1 endopeptidase causes cleavage of the polypeptide Y from the purification sequence.
 17. The DNA sequence according to claim 14 wherein Y is a c-myc peptide sequence, and the purification sequence Pur binds specifically to heparin and/or phosvitin.
 18. The DNA sequence according to claim 14 wherein the insulin is selected from the group consisting of human insulin, bovine insulin and porcine insulin.
 19. A DNA sequence encoding a single-chain insulin precursor having the formula B-N-X-KR-A wherein A and B represent the A and B polypeptide chains, respectively, of insulin, N is methionine or aspartic acid, K is lysine, R is arginine, and X is a chain of amino acid residues sufficient in length to facilitate cleavage in a yeast or fungal host cell at the KR residues, and cleavage at the KR residues produces a double-chain insulin precursor A::B-N-X wherein :: represents two disulfide bonds established between the A and B chains.
 20. The DNA sequence according to claim 19 wherein the chain X of amino acid residues comprises a purification sequence Pur and/or an additional biological polypeptide of interest Y, wherein Pur is a sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of an insulin precursor containing the sequence from a mixture of molecules.
 21. The DNA sequence according to claim 20 wherein Y is a c-myc peptide sequence.
 22. The DNA sequence according to claim 20 wherein the purification sequence binds specifically to heparin and/or phosvitin.
 23. The DNA sequence according to claim 19 wherein the insulin is selected from the group consisting of human insulin, bovine insulin and porcine insulin.
 24. A double-chain insulin precursor having the formula Pur-M-A::B wherein A and B represent the A and B polypeptide chains, respectively, of insulin and :: represents two disulfide bonds established between the A and B chains, M is methionine, and Pur is a purification sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of the insulin precursor from a mixture of molecules.
 25. The double-chain insulin precursor according to claim 24 wherein the double-chain precursor is secreted by a yeast or fungus host cell transformed by a recombinant DNA coding for a single-chain precursor of the double-chain insulin precursor and a leader peptide.
 26. The double-chain insulin precursor according to claim 25 wherein the single-chain precursor has the formula B-KR-Pur-M-A, wherein K is lysine and R is arginine.
 27. The double-chain insulin precursor according to claim 24 wherein the insulin is selected from the group consisting of human insulin, porcine insulin and bovine insulin.
 28. A double-chain insulin precursor having the formula Y-M-A::B wherein A and B represent the A and B polypeptide chains, respectively, of insulin and :: represents two disulfide bonds established between the A and B chains, M is methionine, and Y is an additional polypeptide of interest.
 29. The double-chain insulin precursor according to claim 28 wherein the double-chain precursor is secreted by a yeast or fungus host cell transformed by a recombinant DNA coding for a single-chain precursor of the double-chain insulin precursor and a leader peptide.
 30. The double-chain insulin precursor according to claim 29 wherein the single-chain precursor has the formula B-KR-Y-M-A, K is lysine and R is arginine.
 31. The double-chain insulin precursor according to claim 28 wherein the insulin is selected from the group consisting of human insulin, porcine insulin and bovine insulin.
 32. A double-chain insulin precursor having the formula Y-N-Pur-M-A::B wherein A and B represent the A and B polypeptide chains, respectively, of insulin and :: represents two disulfide bonds established between the A and B chains, M is methionine, N is methionine or aspartic acid, Y is an additional polypeptide of interest, and Pur is a purification sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of the insulin precursor from a mixture of molecules.
 33. The double-chain insulin precursor according to claim 32 wherein the double-chain precursor is secreted by a yeast or fungus host cell transformed by a recombinant DNA coding for a single-chain precursor of the double-chain insulin precursor and a leader peptide.
 34. The double-chain insulin precursor according to claim 33 wherein the single-chain insulin precursor has the formula B-KR-Y-N-Pur-M-A, K is lysine and R is arginine.
 35. The double-chain insulin precursor according to claim 32 wherein the insulin is selected from the group consisting of human insulin, porcine insulin and bovine insulin.
 36. A double-chain insulin precursor having the formula A::B-N-X wherein A and B represent the A and B polypeptide chains, respectively, of insulin and :: represents two disulfide bonds established between the A and B chains, N is methionine or aspartic acid, and X is a chain of amino acid residues that comprises a purification sequence Pur and/or an additional polypeptide of interest, wherein Pur is a sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of an insulin precursor containing the sequence from a mixture of molecules.
 37. The double-chain insulin precursor according to claim 36 wherein the double-chain precursor is secreted by a yeast or fungus host cell transformed by a recombinant DNA coding for a single-chain precursor of the double-chain insulin precursor and a leader peptide.
 38. The double-chain insulin precursor according to claim 37 wherein the single-chain precursor has the formula B-N-X-KR-A, K is lysine and R is arginine.
 39. The double-chain insulin precursor according to claim 36 wherein the insulin is selected from the group consisting of human insulin, porcine insulin and bovine insulin.
 40. A method for preparing insulin from a single-chain protein precursor having the general formula B-Z-A wherein B and A are the two polypeptide chains representing, respectively, the B- and A-chains of insulin, and Z is a polypeptide comprising at least one site for proteolytic cleavage in a transformed yeast or fungus host cell, the method comprising the steps of:
 transforming or transfecting a eukaryotic host cell with an expression vector expressing a DNA sequence encoding a single-chain protein precursor of insulin and comprising any one of SEQ ID NOs: 10 to 18, or a DNA sequence that encodes the same protein as encoded by any one of SEQ ID NOs: 10 to 18, and a leader peptide which directs the protein precursor into a secretion pathway of the host cell;
 cultivating the transformed host in a suitable culture medium;
 recovering from the culture medium a secreted double-chain insulin precursor having the formula -A::B or A::B- or -A::B-, resulting from proteolytic cleavage of the single-chain precursor by the host cell, wherein :: represents two disulfide bonds established between the A and B chains and "-" represents a retained portion of the Z polypeptide on the A and/or the B chain; and
 converting the secreted double-chain insulin precursor to insulin by treatment with a cleaving agent to remove the retained portions of the Z polypeptide.
 41. A method according to claim 40 wherein the single-chain protein precursor is of the general formula (II): B-KR-X-M-A, wherein Z is KR-X-M, K is lysine, R is arginine, M is methionine, X represents a chain of amino acid residues sufficient in length to facilitate cleavage of the Z polypeptide in the host at the KR residues, and the secreted double-chain insulin precursor is X-M-A::B.
 42. A method according to claim 41, wherein the converting step includes cleaving the secreted double-chain insulin precursor at the methionine residue with cyanogen bromide treatment to remove the X chain of amino acid residues.
 43. A method according to claim 40 wherein the single-chain protein precursor is of the general formula (III): B-KR-Pur-M-A, wherein Z is KR-Pur-M, K is lysine, R is arginine, Pur is a purification sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of the secreted double-chain insulin precursor from a mixture of molecules, M is methionine, and cleavage at the KR residues results in the secreted double-chain insulin precursor Pur-M-A::B.
 44. A method according to claim 43, wherein the recovering step includes recovering the secreted double-chain insulin precursor by affinity chromatography via the purification sequence Pur, and the converting step includes cleaving the recovered double-chain insulin precursor at the methionine residue with cyanogen bromide treatment.
 45. A method according to claim 40 wherein the single-chain protein precursor is of the general formula (IV): B-KR-Y-M-A, wherein Z is KR-Y-M, K is lysine, R is arginine, Y is a second polypeptide of interest included in Z, and M is methionine, wherein cleavage at the KR residues results in the secreted double-chain insulin precursor Y-M-A::B.
 46. A method according to claim 45, wherein the converting step includes releasing the second polypeptide Y from the secreted double-chain insulin precursor by cleaving at the methionine residue with cyanogen bromide treatment.
 47. A method according to claim 40 wherein the single-chain protein precursor is of the general formula (V): B-KR-Y-N-Pur-M-A, wherein Z is KR-Y-N-Pur-M, K is lysine, R is arginine, Y is a second polypeptide of interest included in Z, N is methionine or aspartic acid, Pur is a purification sequence as defined in claim 43, M is methionine, and cleavage at the KR residues results in the secreted double-chain insulin precursor Y-N-Pur-M-A::B.
 48. A method according to claim 47, wherein the recovering step includes recovering the secreted double-chain insulin precursor by affinity chromatography via the purification sequence Pur, and the converting step includes cleaving the recovered double-chain insulin precursor at the methionine residue M with cyanogen bromide treatment, and releasing the second polypeptide Y by cleaving at the methionine residue N.
 49. A method according to claim 40 wherein the single-chain protein precursor is of the general formula (VI): B-N-X-KR-A, wherein Z is N-X-KR, N is methionine or aspartic acid, K is lysine, R is arginine, X is as defined in claim 59, and cleavage at the KR residues results in the secreted double-chain insulin precursor A::B-N-X.
 50. A method according to claim 49 wherein the X chain comprises at least either a purification sequence Pur or a second polypeptide Y, wherein Pur is a sequence of amino acid residues that can be recognized and bound by another molecule for in vitro separation of the secreted double-chain insulin precursor from a mixture of molecules.
 51. A method according to claim 49 wherein the converting step includes cleaving the double-chain insulin precursor at the aspartic acid residue N by Pseudomonas fragi Me1 endopeptidase treatment to remove the X chain of amino acid residues.
 52. A method according to claim 40 wherein the codons of the DNA sequence correspond to the most abundant transfer RNAs for each amino acid in the host.
 53. A method according to claim 40 wherein the culture medium is malt-extract-casamino acids culture medium.
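
The two cleavage steps recited in claims 41 and 42 can be illustrated as simple string operations on a precursor of formula B-KR-X-M-A. The sketch below is illustrative only and is not part of the claims. It uses SEQ ID NO: 2 as the example and assumes that the segment between the KR pair and the methionine plays the role of X. The function names are hypothetical. Disulfide bonds and the chemistry of the cleaving agents are not modelled, and dropping the KR pair after the first cleavage is a simplification chosen to match the claimed formula X-M-A::B.

```python
# Insulin chains in one-letter code, as they appear in SEQ ID NO: 2.
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # residues 1-30
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # residues 47-67
X_CHAIN = "EQKLISEEDLVDK"                   # residues 33-45, treated here as X (assumption)

# Single-chain precursor B-KR-X-M-A (SEQ ID NO: 2, 67 residues).
SEQ_ID_NO_2 = B_CHAIN + "KR" + X_CHAIN + "M" + A_CHAIN


def cleave_at_kr(precursor: str) -> tuple[str, str]:
    """Split at the first Lys-Arg pair and drop the pair itself.

    Stands in for proteolytic cleavage in the host cell; removing KR is a
    simplification so that the result matches the formula X-M-A::B.
    """
    site = precursor.index("KR")
    return precursor[:site], precursor[site + 2:]


def cleave_after_met(chain: str) -> list[str]:
    """Split on the carboxyl side of every methionine.

    Stands in for cyanogen bromide treatment; the methionine stays at the
    end of the upstream fragment.
    """
    fragments = []
    start = 0
    for index, residue in enumerate(chain):
        if residue == "M":
            fragments.append(chain[start:index + 1])
            start = index + 1
    if start < len(chain):
        fragments.append(chain[start:])
    return fragments


# Step 1: cleavage at KR gives the B chain and the X-M-A chain.
b_chain, x_m_a = cleave_at_kr(SEQ_ID_NO_2)

# Step 2: cleavage after Met removes X-M and leaves the A chain.
removed, a_chain = cleave_after_met(x_m_a)

print(b_chain == B_CHAIN, a_chain == A_CHAIN, removed)
```

Because neither insulin chain in this example contains methionine, the second step leaves both chains intact: `cleave_after_met(B_CHAIN)` returns the B chain as a single fragment.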